Folate intake and colorectal cancer risk according to genetic subtypes defined by targeted tumor sequencing

Background: Folate is involved in multiple genetic, epigenetic, and metabolic processes, and inadequate folate intake has been associated with an increased risk of cancer.
Objective: We examined whether folate intake is differentially associated with colorectal cancer (CRC) risk according to somatic mutations in genes linked to CRC using targeted sequencing.
Design: Participants within 2 large CRC consortia with available information on dietary folate, supplemental folic acid, and total folate intake were included. Colorectal tumor samples from cases were sequenced for the presence of nonsilent mutations in 105 genes and 6 signaling pathways (IGF2/PI3K, MMR, RTK/RAS, TGF-β, WNT, and TP53/ATM). Multinomial logistic regression models were used to compare mutated and nonmutated CRC cases with controls and to compute multivariable-adjusted odds ratios (ORs) with 95% confidence intervals (CIs). Heterogeneity of associations between mutated and nonmutated CRC cases was tested in case-only analyses using logistic regression. Analyses were performed separately in hypermutated and nonhypermutated tumors, because they exhibit different clinical behaviors.
Results: We included 4339 CRC cases (702 hypermutated tumors, 16.2%) and 11,767 controls. Total folate intake was inversely associated with CRC risk (OR = 0.93; 95% CI: 0.90, 0.96). Among hypermutated tumors, 12 genes (AXIN2, B2M, BCOR, CHD1, DOCK3, FBLN2, MAP3K21, POLD1, RYR1, TET2, UTP20, and ZNF521) showed nominal statistical significance (P < 0.05) for heterogeneity by mutation status, but none remained significant after multiple testing correction. Among these genetic subtypes, the associations between folate variables and CRC were mostly inverse or toward the null, except for tumors mutated for DOCK3 (supplemental folic acid), CHD1 (total folate), and ZNF521 (dietary folate), which showed positive associations. We did not observe differential associations in analyses among nonhypermutated tumors, or according to the signaling pathways.
Conclusions: Folate intake was not differentially associated with CRC risk according to mutations in the genes explored. The nominally significant differential mutation effects observed in a few genes warrant further investigation.


Introduction
Colorectal cancer (CRC) is 1 of the most commonly diagnosed cancers worldwide, with an estimated global incidence of >1.9 million in 2020 [1]. CRC is a multifactorial disease with multiple established or putative genetic and modifiable risk factors [2]. Over the past decades, a substantial body of evidence from experimental and epidemiologic studies has suggested a possible protective role for folate in CRC [3]. Chemically, folate and folic acid constitute a group of compounds that possess a pterin ring conjugated to an aminobenzoate, and at least 1 glutamate moiety [4]. As an essential nutrient, folate is almost exclusively provided by the diet or through supplementation, mostly in the form of folic acid [5].
Observational epidemiologic studies support a possible benefit of dietary folate and supplemental folic acid in CRC occurrence [6]. The most recent meta-analysis, which incorporated findings from 24 cohort studies, showed that high folate intake was associated with a 17% lower risk of developing CRC [7]. Previous meta-analyses on supplemental folic acid and CRC risk reported conflicting associations [6,8]. Great strides have been made in our understanding of the role of folate in colorectal carcinogenesis, with the most plausible pathway being folate's role in 1-carbon metabolism, DNA methylation, and DNA synthesis [9].
Unlike most CRC risk factors that do not interact directly with DNA, folate is essential for the expression of key genes, nucleotide pool balance, DNA repair, and epigenetic machinery, where folate acts as a cofactor for the synthesis of purines and thymidylate [10]. Suboptimal folate intake hypothetically contributes to defective DNA repair, hence promoting an accrual of gene mutations, genome instability, and higher CRC risk [11]. As a methyl donor, folate has a central role both in global DNA methylation, which contributes to chromosomal stability, and in gene-specific promoter methylation [12], acting as a regulator of gene expression [13]. Despite folate's role in DNA methylation and purported role in CRC development, it is unclear whether folate differentially impacts somatic genes involved in colorectal carcinogenesis and what the general implications of folate are in the genetic architecture of colorectal tumors.
Here, we sought to investigate the relationships between total folate intake, and separately dietary folate and supplemental folic acid, and the risk of developing CRC according to acquired somatic mutations in 105 CRC-associated genes and 6 signaling pathways. To investigate this hypothesis, we used pooled data from case-control and cohort studies within 2 international consortia with available tumor tissue samples.

Study participants
Our study population consisted of participants diagnosed with CRC (cases) and controls in the Colorectal Cancer Family Registry and the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), with available folate data and colorectal tumor samples (for cases). The design of each study, selection of controls, ascertainment of CRC cases, and the methods of data pooling and standardization have been extensively described [14,15]. Included studies and the number of participants within each study are summarized in Supplemental Table 1.
The American Journal of Clinical Nutrition 120 (2024) 664-673
CRC cases were defined as individuals who were diagnosed with an incident tumor in the colon or rectum, as confirmed by onco-pathologic records or provincial or state cancer registries, and/or death certificates.

Ethical approval
Each participating study was approved by relevant ethics committees or review boards pertaining to their institutions. All participants provided informed consent at recruitment.

Data collection and harmonization
Data on socio-demographics and lifestyle were collected via in-person interviews or structured self-administered questionnaires at baseline in cohort studies. In case-control studies, socio-demographics and lifestyle were collected at enrollment of control participants and recalled at a point in time 1-2 y before diagnosis for cases. Self-reported or measured anthropometric variables such as height and weight were also collected, as were medical history and dietary assessment. A multistep, iterative data harmonization procedure was undertaken to match each study's unique protocol and collection instruments. The harmonization process was conducted centrally at the GECCO coordinating hub at the Fred Hutchinson Cancer Center, as previously described [15,16]. In brief, common data elements (CDEs) for variables such as sex, age, smoking, and dietary intake were defined a priori for data harmonization. Each defined CDE was then unfolded on the basis of similarity and compatibility/comparability across studies, hence allowing statistical analysis across a combined dataset. Data harmonization used a dynamic communicative and feedback approach with data contributors to map study questionnaires and data dictionaries to these CDEs. Common definitions, permissible values, and standardized coding were implemented in a single database via SAS (version 9.4, SAS Institute; RRID:SCR_008567) and T-SQL. Multiple quality-control checks were performed, and outlying values within and between the studies were truncated to the minimum or maximum value of an established range for each variable. As an example, maximum height was set to 200 cm, and any participant with a reported height above this value was set to 200 cm. This approach was used to prevent outliers from becoming influential points in the analysis.
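The truncation rule described above can be illustrated with a minimal Python sketch. It is not the consortium's SAS/T-SQL implementation; the function name and the lower bound of the height range are hypothetical, and only the 200 cm maximum comes from the text.

```python
def truncate_to_range(value, lower, upper):
    """Clamp a reported value to [lower, upper]; missing values (None) pass through."""
    if value is None:
        return None
    return max(lower, min(upper, value))

# Hypothetical permissible range for height in cm.
# Only the 200 cm maximum is stated in the text; the minimum is an assumption.
HEIGHT_RANGE_CM = (100.0, 200.0)

reported_heights = [172.0, 215.0, None, 158.5]
harmonized_heights = [truncate_to_range(h, *HEIGHT_RANGE_CM) for h in reported_heights]
```

Truncating rather than dropping outlying values keeps the participant in the pooled dataset while limiting the leverage of an implausible measurement.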

Folate status assessment
Usual diet was assessed in each participating study using food-frequency questionnaires or diet history questionnaires. Folate intake was estimated within each study by linking food item consumption and portion sizes with nutrient databases, while accounting for the introduction of folic acid fortification of cereal grains, when applicable (e.g., in United States studies after the year 1998) [17]. This was the case for the Prostate, Lung, Colorectal, & Ovarian Cancer Screening Trial. Folate intake for each participant was determined on the basis of folate content in each food item consumed (folate from foods, in μg/d), and folic acid use (yes or no) was determined from dietary supplements, either single or multiple supplements. We calculated total folate (as dietary folate equivalents, in μg/d) as the sum of dietary folate and supplemental folate (intake of supplemental folic acid multiplied by a factor of 1.7 to account for the higher bioavailability of folic acid compared with dietary folate) [18]. To estimate supplemental folic acid, actual quantities contained in the supplements were applied when available; otherwise, a standard content of 400 μg/d, typical of commercially available folic acid supplements, was assumed. In United States-based studies (e.g., Women's Health Initiative), in which the recruitment of the participants spanned the folic acid fortification years (1996-1998), folic acid from fortified foods was accounted for by adding 1.7 times the folic acid from fortified foods to the total sum. Before modeling, dietary folate and total folate intake (μg) were energy-adjusted (dividing by total energy intake) and categorized as sex- and study-specific quartiles.
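The arithmetic described above can be sketched in Python as follows. This is an illustrative reading of the text, not the study's code: the function names are hypothetical, the expression per 1000 kcal follows the units reported in the tables, and the quartile cut points use the default method of `statistics.quantiles`, which may differ from the method used in the study.

```python
from bisect import bisect_right
from statistics import quantiles

DFE_FACTOR = 1.7                 # bioavailability factor for folic acid (from the text)
DEFAULT_SUPPLEMENT_UG = 400.0    # assumed dose when the actual quantity is unknown (from the text)

def total_folate_dfe(dietary_ug, takes_supplement, supplement_ug=None, fortified_ug=0.0):
    """Total folate in dietary folate equivalents (ug/d)."""
    if takes_supplement:
        dose = supplement_ug if supplement_ug is not None else DEFAULT_SUPPLEMENT_UG
    else:
        dose = 0.0
    return dietary_ug + DFE_FACTOR * (dose + fortified_ug)

def per_1000_kcal(folate_ug, energy_kcal):
    """Energy-adjust by dividing by total energy intake, expressed per 1000 kcal."""
    return folate_ug / energy_kcal * 1000.0

def quartile_labels(values):
    """Assign quartiles 1-4 within one sex- and study-specific group of values."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_right(cuts, v) + 1 for v in values]
```

In practice `quartile_labels` would be called once per sex-by-study stratum, so that the cut points reflect each study's own intake distribution.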

Targeted sequencing
DNA extraction and targeted genome sequencing for somatic alterations were conducted as previously described [19]. In brief, tissue samples with <70% tumor content were macrodissected from slides guided by hematoxylin and eosin staining marking the tumor regions. DNA concentrations were determined by the Quant-iT PicoGreen dsDNA Assay or the Qubit dsDNA HS Assay kits. DNA sequencing libraries were barcoded and then pooled in 48 or 192 samples for sequencing on a HiSeq 2500 (Illumina). Low-yield samples were topped up with additional sequencing where needed. Using the Burrows-Wheeler Aligner (BWA-MEM version 0.7.9), paired-end reads were aligned to the reference human genome (GRCh37/hg19). On the aligned data, local realignments and base quality recalibrations were performed. For downstream analysis, we utilized only reads that were uniquely matched to the reference human GRCh37/hg19 genome assembly. We identified somatic single nucleotide variations (SNVs) using Strelka v1.0.15 [20] and MuTect v1.1.7 [21] and used Annotate Variation (ANNOVAR) to annotate somatic mutation calls, including additional filters such as read depth, alternative read depth, clustered read location, strand bias, and minor allele frequency in the Exome Aggregation Consortium. We plotted point mutations for all samples to determine their hypermutation status, and we noticed 2 distinct peaks. We defined hypermutation status by using the minimum value of 23 point mutations per sample (17 mutations per million bases) between the 2 peaks [19]. We obtained insertion/deletion (indel) calls using majority votes from VarScan2 v2.4.3, VarDict (Feb 2017), and Strelka v1.0.15. After initial filtering of indels on the basis of coverage and mutant allele frequency, we noticed some background signals of alternative reads in normal samples. Thus, we used read counts from tumors and normal samples to construct a background filter to remove indel calls in a subset of samples where signals were not significantly higher than background. We evaluated calls for 91 indels and 96 point mutations chosen at random using Sanger sequencing [22]. Following that, we carried out a validation study employing Sequenom (Laboratory Corporation of America) as an orthogonal technology for indels and point mutations. For point mutation calls, we observed false positive and false negative rates of 0.3% and 4.1%, respectively, with a sensitivity of 95.9% and a specificity of 99.7%.
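Two steps in this paragraph lend themselves to a short illustration: the majority vote across the 3 indel callers, and the relation between the reported error rates and sensitivity/specificity. The Python sketch below is hypothetical. The caller names are only labels, the indel keys (chromosome, position, reference allele, alternate allele) and their values are made up, and the actual pipeline is described in reference [19].

```python
from collections import Counter

def majority_vote(calls_by_caller):
    """Keep indel calls reported by more than half of the callers.

    calls_by_caller maps a caller label to a set of hashable indel keys.
    """
    counts = Counter(key for calls in calls_by_caller.values() for key in set(calls))
    needed = len(calls_by_caller) // 2 + 1
    return {key for key, n in counts.items() if n >= needed}

def sensitivity_specificity(false_negative_rate, false_positive_rate):
    """Sensitivity = 1 - FNR and specificity = 1 - FPR (rates as proportions)."""
    return 1.0 - false_negative_rate, 1.0 - false_positive_rate

# Made-up indel keys for illustration only.
example_calls = {
    "VarScan2": {("chr5", 1000, "AT", "A"), ("chr17", 2000, "G", "GC")},
    "VarDict": {("chr5", 1000, "AT", "A")},
    "Strelka": {("chr17", 2000, "G", "GC"), ("chr2", 3000, "C", "CT")},
}
consensus = majority_vote(example_calls)

# Rates reported in the text for point mutation calls: FNR 4.1%, FPR 0.3%.
sens, spec = sensitivity_specificity(0.041, 0.003)
```

With 3 callers, a call needs support from at least 2 of them, so the indel seen only by one caller in the example is dropped.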
We used the data to fine-tune our calling algorithms further because the validation for indels revealed room for improvement [19]. For a second validation of 109 indels, subsequent Sanger sequencing revealed 93.6% correct calls, and for SNVs, we tested 84 mutations by Sanger sequencing and showed 98.8% correct calls. On the basis of ANNOVAR refGene annotations [23], we defined gene mutations as the presence of nonsilent mutations. If an SNV was annotated as exonic and nonsynonymous, stop-gain, stop-loss, or splicing, it was considered nonsilent. If an indel was annotated as exonic and included a frameshift deletion, frameshift insertion, in-frame deletion, in-frame insertion, stop-gain, or stop-loss, it was considered nonsilent. A total of 227 genes and 6 common signaling pathways (IGF2/PI3K, MMR, RTK/RAS, TGF-β, WNT, and TP53/ATM) were tested.
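The nonsilent definition above can be written as a small rule set. The Python sketch below is one possible reading, under two stated assumptions: the effect labels follow the wording of the text rather than ANNOVAR's literal output strings, and a splicing annotation is treated as nonsilent on its own because ANNOVAR reports splicing as a region rather than as an exonic effect. The dictionary layout of a variant is hypothetical.

```python
# Labels follow the wording of the text, not ANNOVAR's exact output strings (assumption).
NONSILENT_SNV_EFFECTS = {"nonsynonymous", "stop-gain", "stop-loss"}
NONSILENT_INDEL_EFFECTS = {
    "frameshift deletion", "frameshift insertion",
    "in-frame deletion", "in-frame insertion",
    "stop-gain", "stop-loss",
}

def is_nonsilent(variant):
    """variant: dict with 'type' ('SNV' or 'indel'), 'region', and optional 'effect'."""
    kind = variant["type"]
    region = variant["region"]
    effect = variant.get("effect")
    if kind == "SNV":
        # Assumption: a splicing annotation alone counts as nonsilent.
        if region == "splicing":
            return True
        return region == "exonic" and effect in NONSILENT_SNV_EFFECTS
    if kind == "indel":
        return region == "exonic" and effect in NONSILENT_INDEL_EFFECTS
    return False

def gene_is_mutated(variants_in_gene):
    """A gene counts as mutated in a tumor if it carries at least one nonsilent variant."""
    return any(is_nonsilent(v) for v in variants_in_gene)
```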

Statistical analyses
We computed odds ratios (ORs) with 95% confidence intervals (CIs) separately for dietary folate, supplemental folic acid, and total folate associated with CRC risk (reference category was "No" for supplemental folic acid, and 1 μg/1000 kcal/d increment for dietary and total folate) using logistic regression in each study. We performed an individual patient meta-analysis to calculate ORs and 95% CIs for folate variables and CRC in all participants.
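For readers less familiar with how an OR and its CI follow from a fitted logistic model, the standard Wald relation is OR = exp(β) with 95% CI exp(β ± 1.96 × SE). The Python sketch below shows only this conversion; the function name is hypothetical, and the text does not state which interval method the authors used.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper).

    Uses the Wald interval exp(beta -/+ z * se); z = 1.96 gives a 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient and standard error, for illustration only.
or_example, lower_example, upper_example = odds_ratio_with_ci(0.0, 0.1)
```

A coefficient of 0 corresponds to an OR of 1 (no association), and the interval is symmetric on the log scale rather than on the OR scale.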
The statistical analysis for the associations of folate variables and CRC by genetic subtypes is summarized in Figure 1. We used multinomial logistic regression models and pooled individual-level data to compute ORs (95% CIs) for the association between dietary folate, supplemental folic acid, and total folate and the risk of CRC according to mutated and nonmutated gene status of cases compared with control participants. In case-only analyses, logistic regression models were used to examine heterogeneity of folate variable associations between mutated and nonmutated tumors, using the P value of the differential association as P for heterogeneity. Only these P values are presented from the case-only analyses, following a consistent pattern of presentation in other relevant GECCO publications [24-27]. We considered P values <0.05 as nominally significant. We conducted the analyses separately for hypermutated and nonhypermutated tumors because hypermutated tumors exhibit a different behavior: they are more likely to arise in the right-sided colon, are less likely to be diagnosed at stage IV, and have more favorable CRC-specific survival than nonhypermutated tumors [19]. Among nonhypermutated tumors, the analyses were restricted to the genes mutated in ≥5% of total cases, whereas in hypermutated tumors, the analyses were conducted in genes mutated in ≥15% of the cases. Of the 227 mutated genes in hypermutated tumors, 93 genes reached the 15% threshold and were included in our analyses, whereas in nonhypermutated tumors, 12 genes reached the threshold of 5% of mutated cases and were included in our analyses. The rationale for restricting the analyses to this set of genes was to focus on the most important genes with sufficient statistical power. Previous investigations in GECCO demonstrated that hypermutation is driven by mutations in more genes and displays more alterations in multiple pathways, compared with nonhypermutation status, which is driven by fewer genes [19]. Therefore, for each gene, we calculated the percentage of mutated cases over nonmutated cases, separately in hypermutated and nonhypermutated tumors, and retained only genes that passed the 15% and 5% threshold, respectively.
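The gene selection step can be sketched as a simple frequency filter. The Python example below assumes the threshold is applied to the proportion of cases carrying a nonsilent mutation in the gene, which is one reading of the description above; the function name, gene labels, and counts are hypothetical.

```python
def genes_passing_threshold(mutated_case_counts, n_cases, min_fraction):
    """Return genes whose fraction of mutated cases is at least min_fraction.

    mutated_case_counts maps gene -> number of cases with a nonsilent mutation.
    """
    return sorted(
        gene for gene, count in mutated_case_counts.items()
        if count / n_cases >= min_fraction
    )

# Thresholds from the text: 15% in hypermutated and 5% in nonhypermutated tumors.
HYPERMUTATED_MIN_FRACTION = 0.15
NONHYPERMUTATED_MIN_FRACTION = 0.05

# Made-up counts among 100 hypothetical hypermutated cases.
example_counts = {"GENE_A": 20, "GENE_B": 10, "GENE_C": 45}
retained = genes_passing_threshold(example_counts, 100, HYPERMUTATED_MIN_FRACTION)
```

Applying a higher threshold in hypermutated tumors reflects that many more genes are mutated there, so a stricter cut is needed to keep the per-gene comparison groups reasonably sized.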
FIGURE 1. Summary of the statistical analysis for the associations of folate variables and CRC by somatic genetic subtypes. The approaches used for the statistical analyses of the associations between dietary folate, supplemental folic acid, and total folate by genetic subtypes are presented. CRC, colorectal cancer.
In hypermutated tumors, we did not observe any difference (heterogeneity) in the associations between folate intake and mutational status of the evaluated genes after accounting for multiple testing (all Q-values > 0.05) (Supplemental Tables 2-4). Nevertheless, when we considered a nominal association threshold (P < 0.05), we found heterogeneity with dietary folate and 3 genes [chromodomain-helicase-DNA-binding protein 1 (CHD1); tet methylcytosine dioxygenase 2 (TET2); and zinc finger protein 521 (ZNF521)], supplemental folic acid and 4 genes [beta-2-microglobulin (B2M); dedicator of cytokinesis 3 (DOCK3); fibulin 2 (FBLN2); and polymerase delta 1 (POLD1)], and total folate and 6 genes [axin 2 (AXIN2); BCL6 corepressor (BCOR); mitogen-activated protein kinase kinase kinase 21 (MAP3K21); ryanodine receptor 1 (RYR1); small subunit processome component 20 (UTP20); and ZNF521] (Table 2). The associations between folate variables and CRC risk were mostly inverse or toward the null, irrespective of the mutation status of the genes tested. However, for a few genes, folate variables were positively associated with mutated gene status, whereas inverse or null associations were found for the nonmutated genes. These were observed for 3 genes: ZNF521 (total and dietary folate), CHD1 (total folate), and DOCK3 (supplemental folic acid). The highest OR was noted for supplemental folic acid intake in relation to DOCK3-mutated CRC risk (OR: 1.88; 95% CI: 1.09, 3.25). No significant differential associations were observed for nonhypermutated tumors (Supplemental Tables 5-7). The associations with dietary folate were nonsignificant for all tested pathways in both hypermutated and nonhypermutated tumors (Table 3). Supplemental folic acid and, similarly, total folate showed null or inverse associations for the pathways. Overall, we did not observe any differential associations for the pathways tested. We also did not observe any significant associations of folate variables and CRC risk according to mutation burden of the tumors (hypermutated compared with nonhypermutated) (Supplemental Table 8).
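The Q-values referred to in these results come from a Benjamini-Hochberg correction of the P values for heterogeneity. A minimal Python sketch of the standard step-up procedure is given below; the function name and the example P values are hypothetical and are not taken from the study's tables.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P to the smallest, enforcing monotone Q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical P values for 4 tests, for illustration only.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With many genes tested per folate variable, a gene can have P < 0.05 while its Q-value stays above 0.05, which is the pattern reported here for the nominally significant genes.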

Discussion
Using data from 2 large consortia, we observed inverse associations between supplemental folic acid and total folate and CRC risk. We did not observe differential associations according to mutation status in the 105 tested genes and 6 signaling pathways, after correcting for multiple testing. Nonetheless, we observed some indication of nominal significance, with 1 or more folate variables being differentially associated with CRC according to somatic mutations in AXIN2, B2M, BCOR, CHD1, DOCK3, FBLN2, MAP3K21, POLD1, RYR1, TET2, UTP20, and ZNF521. The associations between folate and CRC were mostly inverse or toward the null, except for DOCK3-, CHD1-, and ZNF521-mutated tumors, for which we observed positive associations.
The inverse associations observed between total folate intake and CRC risk have been reported in previous epidemiologic studies and summarized in meta-analyses [8,9]. Folate may mitigate genetic and epigenetic changes by preventing global DNA hypomethylation and genome instability [29]. Zsigrai et al. [30] have demonstrated that folic acid supplementation influences the genetics and epigenetics of CRC cell lines and, more importantly, differentially targets several genes, and hence could contribute to differential associations by molecular subtypes of the tumors. This is supported by experimental data from animal and mechanistic studies showing the relationship between folate intake and DNA repair and mutation rates in colonic tissues [10]. Folate has been shown to modulate the expression of genes involved in the colonic cell cycle and various signaling pathways in in vitro experimentation [31]. Similar findings were also reported by Schernhammer et al. [32], who showed that folate intake was inversely associated with the incidence of long-interspersed nucleotide element-1 (LINE-1) hypomethylated colorectal tumors, whereas LINE-1-hypermutated tumors showed null associations, further stressing the role of folate deficiency in specific carcinogenic mechanisms. In addition, the association between folate and CRC has been reported in cell line analyses to be differential in DNA-repair-associated genes [33]. Nevertheless, previous epidemiologic studies have not shown differential associations between folate intake and CRC risk by BRAF or KRAS status in 3 United States cohort studies, i.e., the Iowa Women's Health Study, the Nurses' Health Study, and the Health Professionals Follow-up Study [34,35]. Overall, previous experimental studies have suggested that the role of folate in CRC may be gene- or pathway-specific, but observational epidemiologic studies did not report on these findings, either by using specific genes or by exploring the associations according to broad molecular subtypes.
The genes with marginal differential associations in our study were not previously specifically associated with folate metabolism. For example, DOCK3 (previously known as modifier of cell adhesion or presenilin-binding protein), a member of the DOCK180 family of guanine nucleotide exchange factors, has not been consistently reported as a significant gene in folate metabolism or colorectal carcinogenesis. Although DOCK3 is expressed in colon tissues, it is mostly known for its role in cytoskeleton organization and cell-matrix modeling [36], and as such has been investigated as a stimulator of axonal growth [37] or in conditions such as attention-deficit hyperactivity disorder [38]. Nonetheless, DOCK3 has been shown to be mutated in CRC tumors and might, according to a genetic principal component analysis, serve with a panel of other genes such as CACNA1D, SERPINB4, and ZBED6 to define a subtype of CRC [39]. In our study, DOCK3 was mutated alone, thus suggesting potential specificity of this gene to be further explored. Although it is unclear how folate differentially affects DOCK3-mutated tumors, our findings are intriguing and warrant further investigation to validate and explain the role of this gene in the folate-CRC association.
ZNF521 is a 30-zinc finger transcription cofactor with regulatory functions in hematopoietic, adipose tissue, and mesenchymal stem cells [40]. This gene has been widely reported in experimental tissue and animal studies for its expression in the brain and implications in brain structure development, regulation of cerebellum development, and differentiation of striatal neurons [41]. ZNF521 is expressed in some cancers such as myeloid leukemia, in which it is tagged as a potential therapeutic pathway because of its role in DNA transcription [42]. ZNF521 expression has also been associated with the prognosis of pediatric neuroblastoma [43], gastric cancer, CRC [44], and ovarian cancer [40]. Furthermore, it has been linked to accelerated differentiation in cell lines such as erythroid cells [45] or brain cells [46], whereas in other tissues such as adipose tissue it was reported to repress differentiation of stem cells [47]. Thus, ZNF521 may usually act as a promoter or occasionally as a suppressor of transcription and cancer risk, depending on the tissue and the cellular context. There is growing evidence on the regulation of ZNF521 in several cancers, and the recent advances in folate-miRNA relationships [48] could possibly provide additional mechanistic explanation of our finding in future studies.
Another notable finding was the significant positive association observed between total folate and CHD1 in hypermutated tumors, whereas a null association was found in nonhypermutated tumors. There is no clear explanation for such differential associations, and there is scarce evidence supporting a role of folate specifically related to CHD1 mutation. CHD1 belongs to the family of nucleosome remodeling ATPases, whose role is to assemble, slide, and remove nucleosomes from the DNA [49]. CHD1 possesses 3 domains: N- and C-terminal domains that act as chromodomain pairs and a DNA-binding domain, respectively, and a central ATPase motor. The latter binds to the active epigenetic methylation of histone 3 lysine 4, participates in DNA unwrapping, and increases DNA accessibility and transcriptional elongation [50]. It is known that CHD1 is particularly expressed in microsatellite instability (MSI)-high colorectal tumors [51]. Moreover, folate status has also been increasingly associated with CHD1-linked pathways in cancers, such as the adenomatous polyposis coli/wingless (APC/WNT) pathway [31]. Furthermore, CHD1 expression loss was associated with DNA repair impairment and accumulation of DNA breaks, and CHD1 is essential for the reprogramming of somatic cells [51]. Taken together, our findings of a differential association with a chromatin-remodeling factor such as CHD1 [52] call for additional studies to further understand the link between folate's impact at the gene level and histone modification, chromatin remodeling, and colorectal carcinogenesis.
Units: dietary folate (μg/1000 kcal/d), supplemental folic acid (Yes vs. No), total folate (μg/1000 kcal/d). (1) P for heterogeneity was determined in case-only analysis comparing mutated cases to nonmutated cases. The results are sorted from lowest to highest P value. (2) Benjamini-Hochberg correction applied to P for heterogeneity (Q-value).
The main strength of our study was its large sample size coupled with the uniquely large, carefully designed panel of genes sequenced. In addition, we took advantage of the rigorous approach to data harmonization conducted across studies and were able to include dietary folate and supplemental folic acid in our analysis. A major limitation of our study was that it included only participants of European ancestry; thus, our findings may not be extrapolated to other populations. Another limitation is that supplemental folic acid use was missing for approximately one-fourth (27.0%) of the participants. In addition, folate variables were collected at 1 point in time and may not take into consideration possible changes in the diet or supplement use over the years. Furthermore, some folate supplements may contain other B vitamins, which could enhance folate's physiologic actions. Finally, we did not have detailed information on cancer treatment for all the participants. Patients with rectal cancer commonly undergo neoadjuvant treatment consisting of radiotherapy with or without chemotherapy, but most participants with rectal cancer in our sample would have been diagnosed before neoadjuvant treatment became the standard of practice, and thus, neoadjuvant treatments would not have impacted the mutation profile of the tumor.
In conclusion, in this large-scale analysis of folate intake in relation to somatic mutations in CRC, we found little evidence of differential CRC risk according to the mutational status of all the genes tested, after accounting for multiple testing. Nonetheless, we observed some nominally significant differential associations in 12 genes, including positive associations between folate intake and the risk of CHD1-, DOCK3-, and ZNF521-mutated colorectal tumors, which warrant replication in future studies. Overall, our findings demonstrated that the role of folate in CRC may be far more complex than hypothesized, and dietary folate and supplemental folic acid may impact CRC risk depending on specific genes.

FIGURE 2. Flowchart of inclusion of the participants. Number of studies and participants included in the final analytical dataset.

FIGURE 3. Forest plot for folate variables and colorectal cancer risk. Odds ratios and 95% confidence intervals for dietary folate, supplemental folic acid, and total folate associated with colorectal cancer for each participating study.

TABLE 1
Characteristics of the study case and control populations. Abbreviations: BMI, body mass index; CRC, colorectal cancer. There are no missing values for a variable, unless otherwise specified in the table.

TABLE 2
Summary associations between folate variables and colorectal cancer risk according to mutations in specific genes in hypermutated tumors

TABLE 3
Associations between folate variables and colorectal cancer risk according to signaling pathways.